<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<html>
<head>
  <title>
    GMP Development Projects
  </title>
</head>
<body bgcolor=pink>

<center>
  <h1>
    GMP Development Projects
  </h1>
</center>

<font size=-1>
Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br>
This file is part of the GNU MP Library. <br><br>
The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version. <br><br>
The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
License for more details. <br><br>
You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
MA 02111-1307, USA.
</font>

<hr>
<!-- NB. timestamp updated automatically by emacs -->
<comment>
  This file current as of 14 May 2002.  An up-to-date version is available at
  <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
  Please send comments about this page to
  <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
</comment>

<p> This file lists projects suitable for volunteers.  Please see the
    <a href="tasks.html">tasks file</a> for smaller tasks.

<p> If you want to work on any of the projects below, please let tege@swox.com
    know.  If you want to help with a project that already somebody else is
    working on, please talk to that person too, tege@swox.com can put you in
    touch.  (There are no email addresses of volunteers below, due to spamming
    problems.)

<ul>
<li> <strong>Faster multiplication</strong>

  <p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
      or Fermat FFT.  Several new developments are desirable:

  <ol>
    <li> Handle multiplication of operands with different digit count
	 better than today.  We now split the operands in a very
	 inefficient way, see mpn/generic/mul.c.

    <li> Consider N-way Toom-Cook.  See Knuth's Seminumerical
	 Algorithms for details on the method.  Code implementing it
	 exists.  This is asymptotically inferior to FFTs, but is finer
	 grained.  A toom-4 might fit in between toom-3 and the FFTs
	 (or it might not).

    <li> It's possible CPU dependent effects like cache locality will
         have a greater impact on speed than algorithmic improvements.

    <li> Add support for partial products, either a given number of low limbs
         or high limbs of the result.  A high partial product can be used by
         <code>mpf_mul</code> now, a low half partial product might be of use
         in a future sub-quadratic REDC.  On small sizes a partial product
         will be faster simply through fewer cross-products, similar to the
         way squaring is faster.  But work by Thom Mulders shows that for
         Karatsuba and higher order algorithms the advantage is progressively
         lost, so for large sizes partial products turn out to be no faster.

  </ol>

  <p> Another possibility would be an optimized cube.  In the basecase that
      should definitely be able to save cross-products in a similar fashion to
      squaring, but some investigation might be needed for how best to adapt
      the higher-order algorithms.  Not sure whether cubing or further small
      powers have any particularly important uses though.

<p> <li> <strong>Assembly routines</strong>

  <p> Write new and improve existing assembly routines.  The tests/devel
  programs and the tune/speed.c and tune/many.pl programs are useful for
  testing and timing the routines you write.  See the README files in those
  directories for more information.

  <p> Please make sure your new routines are fast for these three situations:
  <ol>
    <li> Operands that fit into the cache.
    <li> Small operands of less than, say, 10 limbs.
    <li> Huge operands that does not fit into the cache.
  </ol>

  <p> The most important routines are mpn_addmul_1, mpn_mul_basecase and
  mpn_sqr_basecase.  The latter two don't exist for all machines, while
  mpn_addmul_1 exists for almost all machines.

  <p> Standard techniques for these routines are unrolling, software
  pipelining, and specialization for common operand values.  For machines with
  poor integer multiplication, it is often possible to improve the performance
  using floating-point operations, or SIMD operations such as MMX or Sun's VIS.

  <p> Using floating-point operations is interesting but somewhat tricky.
  Since IEEE double has 53 bit of mantissa, one has to split the operands in
  small prices, so that no result is greater than 2^53.  For 32-bit computers,
  splitting one operand into 16-bit pieces works.  For 64-bit machines, one
  operand can be split into 21-bit pieces and the other into 32-bit pieces.  (A
  64-bit operand can be split into just three 21-bit pieces if one allows the
  split operands to be negative!)


<p> <li> <strong>Faster extended GCD</strong>

  <p> We currently have extended GCD based on Lehmer's method.
  But the binary method can quite easily be improved for bignums
  in a similar way Lehmer improved Euclid's algorithm.  The result should
  beat Lehmer's method by about a factor of 2.  Furthermore, the multiprecision
  step of Lehmer's method and the binary method will be identical, so the
  current code can mostly be reused.  It should be possible to share code
  between GCD and GCDEXT, and probably with Jacobi symbols too.

  <p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
  that the most likely algorithm doesn't kick in until about 3000 limbs.

<p> <li> <strong>Math functions for the mpf layer</strong>

  <p> Implement the functions of math.h for the GMP mpf layer!  Check the book
  "Pi and the AGM" by Borwein and Borwein for ideas how to do this.  These
  functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
  cosh, exp, log, log10, pow, sin, sinh, tan, tanh.

  <p> These are implemented in mpfr, redoing them in mpf might not be worth
  bothering with, if the long term plan is to bring mpfr in as the new mpf.

<p> <li> <strong>Faster sqrt</strong>

  <p> The current code uses divisions, which are reasonably fast, but it'd be
  possible to use only multiplications by computing 1/sqrt(A) using this
  formula:
  <pre>
                                    2
                   x   = x  (3 - A x )/2.
                    i+1   i         i  </pre>
  And the final result
  <pre>
                     sqrt(A) = A x .
                                  n  </pre>
  That final multiply might be the full size of the input (though it might
  only need the high half of that), so there may or may not be any speedup
  overall.

<p> <li> <strong>Nth root</strong>

  <p> Implement, at the mpn level, a routine for computing the nth root of a
  number.  The routine should use Newton's method, preferably without using
  division.

  <p> If the routine becomes fast enough, perhaps square roots could be computed
  using this function.

<p> <li> <strong>Quotient-Only Division</strong>

  <p> Some work can be saved when only the quotient is required in a division,
      basically the necessary correction -0, -1 or -2 to the estimated
      quotient can almost always be determined from only a few limbs of
      multiply and subtract, rather than forming a complete remainder.  The
      greatest savings are when the quotient is small compared to the dividend
      and divisor.

  <p> Some code along these lines can be found in the current
      <code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
      remainder than might be strictly necessary.  That function in its
      current form actually then always goes on to calculate a full remainder.
      Burnikel and Zeigler describe a similar approach for the divide and
      conquer case.

<p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>

  <p> <code>mpn_bdivmod</code> and the <code>redc</code> in
      <code>mpz_powm</code> should use some sort of divide and conquer
      algorithm.  This would benefit <code>mpz_divexact</code>, and
      <code>mpn_gcd</code> on large unequal size operands.  See "Exact
      Division with Karatsuba Complexity" by Jebelean for a (brief)
      description.

  <p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
      added to control whether <code>mpz_divexact</code> uses
      <code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
      is faster on large divisors.

  <p> For the REDC, basically all that's needed is Montgomery's algorithm done
      in multi-limb integers.  R is multiple limbs, and the inverse and
      operands are multi-precision.

  <p> For exact division the time to calculate a multi-limb inverse is not
      amortized across many modular operations, but instead will probably
      create a threshold below which the current style
      <code>mpn_bdivmod</code> is best.  There's also Krandick and Jebelean,
      "Bidirectional Exact Integer Division" to basically use a low to high
      exact division for the low half quotient, and a quotient-only division
      for the high half.

  <p> It will be noted that low-half and high-half multiplies, and a low-half
      square, can be used.  These ought to each take as little as half the
      time of a full multiply, or square, though work by Thom Mulders shows
      the advantage is progressively lost as Karatsuba and higher algorithms
      are applied.

<p> <li> <strong>Exceptions</strong>

  <p> Some sort of scheme for exceptions handling would be desirable.
      Presently the only thing documented is that divide by zero in GMP
      functions provokes a deliberate machine divide by zero (on those systems
      where such a thing exists at least).  The global <code>gmp_errno</code>
      is not actually documented, except for the old <code>gmp_randinit</code>
      function.  Being currently just a plain global means it's not
      thread-safe.

  <p> The basic choices for exceptions are returning an error code or having
      a handler function to be called.  The disadvantage of error returns is
      they have to be checked, leading to tedious and rarely executed code,
      and strictly speaking such a scheme wouldn't be source or binary
      compatible.  The disadvantage of a handler function is that a
      <code>longjmp</code> or similar recovery from it may be difficult.  A
      combination would be possible, for instance by allowing the handler to
      return an error code.

  <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
      normally be detected at the start of functions, so exception handling
      would have a clean state.  What's worth considering though is that the
      GMP function detecting the exception may have been called via some third
      party library or self contained application module, and hence have
      various bits of state to be cleaned up above it.  It'd be highly
      desirable for an exceptions scheme to allow for such cleanups.

  <p> The C++ destructor mechanism could help with cleanups both internally
      and externally, but being a plain C library we don't want to depend on
      that.

  <p> A C++ <code>throw</code> might be a good optional extra exceptions
      mechanism, perhaps under a build option.  For GCC
      <code>-fexceptions</code> will add the necessary frame information to
      plain C code, or GMP could be compiled as C++.

  <p> Out-of-memory exceptions are expected to be handled by the
      <code>mp_set_memory_functions</code> routines, rather than being a
      prospective part of divide-by-zero etc.  Some similar considerations
      apply but what differs is that out-of-memory can arise deep within GMP
      internals.  Even fundamental routines like <code>mpn_add_n</code> and
      <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
      vector systems).  Allowing for an error code return would require an
      awful lot of checking internally.  Perhaps it'd still be worthwhile, but
      it'd be a lot of changes and the extra code would probably be rather
      rarely executed in normal usages.

  <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
      general, lead to memory leaks and may leave GMP variables operated on in
      inconsistent states.  Maybe it'd be possible to record recovery
      information for use by the relevant allocate or reallocate function, but
      that too would be a lot of changes.

  <p> One scheme for out-of-memory would be to note that all GMP allocations
      go through the <code>mp_set_memory_functions</code> routines.  So if the
      application has an intended <code>setjmp</code> recovery point it can
      record memory activity by GMP and abandon space allocated and variables
      initialized after that point.  This might be as simple as directing the
      allocation functions to a separate pool, but in general would have the
      disadvantage of needing application-level bookkeeping on top of the
      normal system <code>malloc</code>.  An advantage however is that it
      needs nothing from GMP itself and on that basis doesn't burden
      applications not needing recovery.  Note that there's probably some
      details to be worked out here about reallocs of existing variables, and
      perhaps about copying or swapping between "permanent" and "temporary"
      variables.

  <p> Applications desiring a fine-grained error control, for instance a
      language interpreter, would very possibly not be well served by a scheme
      requiring <code>longjmp</code>.  Wrapping every GMP function call with a
      <code>setjmp</code> would be very inconvenient.

  <p> Stack overflow is another possible exception, but perhaps not one that
      can be easily detected in general.  On i386 GNU/Linux for instance GCC
      normally doesn't generate stack probes for an <code>alloca</code>, but
      merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
      miss the stack redzone and hit arbitrary data.  GMP stack usage is
      normally a function of operand size, knowing that might suffice for some
      applications.  Otherwise a fixed maximum usage can probably be obtained
      by building with <code>--enable-alloca=malloc-reentrant</code> (or
      <code>notreentrant</code>).

  <p> Actually recovering from stack overflow is of course another problem.
      It might be possible to catch a <code>SIGSEGV</code> in the stack
      redzone and do something in a <code>sigaltstack</code>, on systems which
      have that, but recovery might otherwise not be possible.  This is worth
      bearing in mind because there's no point worrying about tight and
      careful out-of-memory recovery if an out-of-stack is fatal.


<p> <li> <strong>Test Suite</strong>

  <p> Add a test suite for old bugs.  These tests wouldn't loop or use
      random numbers, but just test for specific bugs found in the
      past.

  <p> More importantly, improve the random number controls for test
      collections:

      <ol>
        <li> Use the new random number framework.
        <li> Have every test accept a seed parameter.
        <li> Allow `make check' to set the seed parameter.
        <li> Add more tests for important, now untested functions.
      </ol>

  <p> With this in place, it should be possible to rid GMP of
      practically all bugs by having some dedicated GMP test machines
      running "make check" with ever increasing seeds (and
      periodically updating to the latest GMP).

  <p> If a few more ASSERTs were sprinkled through the code, running
      some tests with --enable-assert might be useful, though not a
      substitute for tests on the normal build.

  <p> An important feature of any random tests will be to record the
      seeds used, and perhaps to snapshot operands before performing
      each test, so any problem exposed can be reproduced.


<p> <li> <strong>Performance Tool</strong>

  <p> It'd be nice to have some sort of tool for getting an overview of
      performance.  Clearly a great many things could be done, but some
      primary uses would be,

      <ol>
        <li> Checking speed variations between compilers.
        <li> Checking relative performance between systems or CPUs.
      </ol>

  <p> A combination of measuring some fundamental routines and some
      representative application routines might satisfy these.

  <p> The tune/time.c routines would be the easiest way to get good
      accurate measurements on lots of different systems.  The high level
      <code>speed_measure</code> may or may not suit, but the basic
      <code>speed_starttime</code> and <code>speed_endtime</code> would cover
      lots of portability and accuracy questions.


</ul>
<hr>

</body>
</html>

<!--
Local variables:
eval: (add-hook 'write-file-hooks 'time-stamp)
time-stamp-start: "This file current as of "
time-stamp-format: "%:d %3b %:y"
time-stamp-end: "\\."
time-stamp-line-limit: 50
End:
-->
